Document Image Binarization
Abstract
Binarization, in which pixels are classified as text or background, is a principal stage of the document image analysis pipeline. It is a crucial stage that can affect all subsequent stages, including final character recognition. This thesis focuses on document image binarization, covering both binarization techniques and evaluation methodologies. For the proposed performance evaluation methodology, the pixel-level ground-truth image is constructed using a semi-automatic procedure based on the edges and the skeleton of the characters. The new measures use (a) weights that start from the ground-truth contour and (b) the local stroke width, which limits the weights to the vicinity of the character areas and normalizes them properly. Experimental results demonstrate the validity and effectiveness of the new measures for document images, whereas other measures were designed for image or signal processing in general. Concerning binarization techniques, some improvements were initially proposed for the well-known technique of Yang and Yan. To further enhance binarization quality and improve robustness against different types of degradation (e.g. faint characters, bleed-through and non-uniform background), a new binarization technique was developed, based on background estimation and on the combination of selected global and local binarization techniques. Additionally, a technique was developed for binarizing text areas captured from video content. This technique is also based on the Yang and Yan technique and sets low and high values of its global parameter for the inside and outside areas of the text. Initially, the text areas are defined from the baselines of the text; at the final stage, they are refined using the convex hulls of neighbouring textual components.
Furthermore, through the document image binarization contests that we organized, a publicly available benchmark has been created that aids in the development of document image binarization techniques and evaluation methodologies.
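As a minimal illustration of the two ideas the abstract relies on, classifying pixels into text and background and scoring the result against a pixel-level ground truth, the sketch below uses standard Otsu global thresholding and a plain, unweighted F-measure. Both are stand-ins chosen for brevity: this is not the thesis's proposed technique, nor its weighted, stroke-width-normalized measures. The function names (`otsu_threshold`, `binarize`, `f_measure`) and the nested-list image representation are assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of gray levels 0..255


def otsu_threshold(image: Image) -> int:
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # all pixels are already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image: Image, threshold: int) -> Image:
    """Label pixels at or below the threshold as text (1), others as background (0)."""
    return [[1 if value <= threshold else 0 for value in row] for row in image]


def f_measure(predicted: Image, ground_truth: Image) -> float:
    """Plain pixel-level F-measure of the text class (no weighting)."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, truth in zip(pred_row, gt_row):
            if pred == 1 and truth == 1:
                tp += 1
            elif pred == 1 and truth == 0:
                fp += 1
            elif pred == 0 and truth == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Tiny synthetic image: dark values stand for ink, bright values for paper.
    image = [[20, 30, 200], [210, 25, 220]]
    ground_truth = [[1, 1, 0], [0, 1, 0]]
    result = binarize(image, otsu_threshold(image))
    print(result, f_measure(result, ground_truth))
```

A single global threshold of this kind is exactly what tends to fail on faint characters, bleed-through and non-uniform backgrounds, which is the motivation the abstract gives for combining background estimation with global and local techniques.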
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Ancient Document Images Enhancement Using Phase Based Binarization
In this paper, we present a phase-based binarization model for degraded document images, as well as a post-processing method that can improve any binarization method and a ground-truth generation tool. Many binarization techniques have been implemented in the literature for different types of binarization problems. It includes an adaptive image contrast based document image binarization technique t...
Document Image Binarization Using Threshold Segmentation
Binarization is the process of generating a binary image from a document image. Document image binarization has been under research for many years, and many binarization algorithms have been proposed for different types of degraded document images. Document image binarization is widely used to enhance old handwritten and machine-printed documents. Still, recovering a degraded document is very tedi...
Foreground-Background Regions Guided Binarization of Camera-Captured Document Images
Binarization is an important preprocessing step in several document image processing tasks. Handheld camera devices, which allow fast and flexible document image capture, are now in widespread use. However, they may produce degraded grayscale images, especially due to bad shading or non-uniform illumination. State-of-the-art binarization techniques, which are designed for scanned images, do not...
A Survey on Degraded Document Image Binarization Techniques
Segmentation is the main technique used in image binarization to separate pixel values into two classes, black as foreground and white as background. Degraded document images are segmented using image binarization in order to obtain clear images that match the original documents. Thresholding process is t...
A Combination of Laplacian Energy, Global and Adaptive Techniques for Degraded Document Image Binarization
Many document image binarization algorithms have previously been proposed for enhancing the performance of degraded document image binarization. This paper reviews algorithms for document image binarization. All of the algorithms have some advantages and disadvantages. To remove these drawbacks, this paper proposes a combined approach that first combines different types of global and local t...